Maximizing TLP with loop-parallelization on SMT

Authors

  • Diego Puppin
  • Dean Tullsen
Abstract

This paper describes research in exploiting loop-level parallelism on a simultaneous multithreading processor. We discuss some general and ad-hoc techniques for loop parallelization that proved to be effective with SMT, and how they were tuned for it. These techniques have been tested on the well-known Livermore loops, chosen for their variety of behaviors. The set of optimizations used produced significant improvement overall: we were able to improve average IPC from 2.72 to 3.97, and to gain an average speedup of 1.39 over optimized single-thread code, using up to eight threads. We also describe a simple but effective method for determining the best number of threads to be used for parallel loops on a multithreaded processor. The model uses compile-time information to predict the most efficient point.


Similar resources

Evaluating the Performance Potential of Function Level Parallelism

Because of technology advances, the current trend in processor architecture design focuses on placing multiple cores on a single chip instead of increasing the complexity of single-core processors. These upcoming processors are able to execute several threads in parallel, which makes them a suitable platform for the application of automatic parallelization techniques. Most of the research efforts conc...


A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...


Exploring the Capacity of a Modern SMT Architecture to Deliver High Scientific Application Performance

Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that heterogeneity of simultaneously executed applications can bring about significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threa...


LoopProf: Dynamic Techniques for Loop Detection and Profiling

As processors transition to multithreaded, multi-core designs, sequential programs will no longer achieve historical performance gains from advances in technology. This trend places a greater responsibility on programmers and software for program optimization. Vectorization and thread-level parallelism (TLP) will be increasingly relied upon in addition to instruction-level parallelism (ILP) and ...


Mini-Threads: Increasing TLP on Small-Scale SMT Processors

Several manufacturers have recently announced the first simultaneous-multithreaded processors, both as single CPUs and as components of multi-CPU chips. All are small scale, comprising only two to four thread contexts. A significant impediment to the construction of larger-scale SMTs is the register file size required by a large number of contexts. This paper introduces and evaluates minithread...



Publication date: 2001